On Differentially Private Longest Increasing Subsequence Computation in Data Stream
Abstract
Many important applications require continuous computation of statistics over data streams. Activity monitoring, surveillance, and fraud detection are settings where monitoring applications must protect users' sensitive information while efficiently computing the required statistics. Over the last two decades, a broad range of techniques for time-series and stream data monitoring have been developed to provide provable privacy guarantees based on the formal notion of differential privacy. Although these solutions are well established, they are mostly limited to count-based statistics (e.g., the number of distinct elements or heavy hitters) and do not apply in settings where more complex statistics are needed. In this paper, we consider the more general problem of estimating the sortedness of a data stream by privately computing the length of the longest increasing subsequence (LIS). This statistic can be used to detect surprising trends in time-series data (e.g., in finance) and to perform approximate string matching in computational biology. Our approaches adopt differential privacy, which provides strong and provable privacy guarantees. Our solutions estimate the length of the LIS using block decomposition and local approximation techniques. We provide a rigorous analysis that bounds the approximation error of our algorithms in terms of the privacy level and the length of the stream. Furthermore, we extend our solutions to computing the length of the LIS over sliding windows and show the beneficial effect of this formulation on the final utility. An extensive experimental evaluation on real-world data streams demonstrates the effectiveness of our approaches for computing accurate statistics and detecting surprising trends.
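As a minimal illustration of the quantities the abstract refers to (and not the authors' block-decomposition algorithm), the sketch below computes the exact LIS length with patience sorting and releases it once through the Laplace mechanism. It assumes that neighbouring streams differ in the value of a single element; under that assumption, replacing one element changes the LIS length by at most 1, so Laplace noise of scale 1/ε gives ε-differential privacy for a single release. The function names `lis_length`, `laplace_noise`, and `private_lis_length` are hypothetical, and the floating-point noise sampler is not hardened for production use.

```python
import bisect
import random
from typing import Iterable, List, Optional


def lis_length(stream: Iterable[float]) -> int:
    """Exact length of the longest strictly increasing subsequence (patience sorting).

    tails[i] holds the smallest last value of any increasing subsequence of
    length i + 1 seen so far; each update is a binary search, O(log n).
    """
    tails: List[float] = []
    for x in stream:
        # bisect_left (not bisect_right) keeps the subsequence strictly increasing
        i = bisect.bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)  # x extends the longest subsequence found so far
        else:
            tails[i] = x  # x is a smaller tail for subsequences of length i + 1
    return len(tails)


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample: difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_lis_length(
    stream: Iterable[float], epsilon: float, rng: Optional[random.Random] = None
) -> float:
    """One-shot epsilon-DP release of the LIS length via the Laplace mechanism.

    Assumption (hypothetical neighbouring relation): two streams are neighbours
    if they differ in the value of one element, so the sensitivity is 1.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    return lis_length(stream) + laplace_noise(1.0 / epsilon, rng)
```

For example, `lis_length([3, 1, 4, 1, 5, 9, 2, 6])` returns 4 (e.g., 1, 4, 5, 9). Applying the same function to the last w items gives a one-shot sliding-window value, but each additional release consumes privacy budget under sequential composition, so continual release over a stream is the harder setting that the paper's techniques target.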
Similar resources
Private Computation of the Longest Increasing Subsequence in Data Streams
In this paper, we study the problem of privately computing ordered statistics with the goal of monitoring sequential data streams. Despite the broad series of techniques for time-series monitoring, only a few works provide provable privacy guarantees employing the formal notion of differential privacy. While these solutions are well established, their focus is mostly limited to count-based statis...
Tight Lower Bounds for Multi-pass Stream Computation Via Pass Elimination
There is a natural relationship between lower bounds in the multi-pass stream model and lower bounds in multi-round communication. However, this connection is less understood than the connection between single-pass streams and one-way communication. In this paper, we consider data-stream problems for which reductions from natural multi-round communication problems do not yield tight bounds or d...
Finding Longest Increasing and Common Subsequences in Streaming Data
In this paper, we present algorithms and lower bounds for the Longest Increasing Subsequence (LIS) and Longest Common Subsequence (LCS) problems in the data streaming model. For the problem of deciding whether the LIS of a given stream of integers drawn from {1, ..., m} has length at least k, we discuss a one-pass streaming algorithm using O(k log m) space, with update time either O(log k) or...
Cell-probe bounds for online edit distance and other pattern matching problems
We give cell-probe bounds for the computation of edit distance, Hamming distance, convolution and longest common subsequence in a stream. In this model, a fixed string of n symbols is given and one δ-bit symbol arrives at a time in a stream. After each symbol arrives, the distance between the fixed string and a suffix of most recent symbols of the stream is reported. The cell-probe model is per...
A note on randomized streaming space bounds for the longest increasing subsequence problem
The deterministic space complexity of approximating the length of the longest increasing subsequence of a stream of N integers is known to be Θ̃(√N). However, the randomized complexity is wide open. We show that the technique used in earlier work to establish the Ω(√N) deterministic lower bound fails strongly under randomization: specifically, we show that the communication problems on which...
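The one-pass decision problem quoted in the "Finding Longest Increasing and Common Subsequences in Streaming Data" entry (does the LIS reach length k?) can be illustrated with a patience-sorting array truncated to k entries. This is a minimal sketch of one standard way to meet the stated O(k log m) space and O(log k) update time, assuming integer values in {1, ..., m}; it is not claimed to be that paper's algorithm, and the class name `LISAtLeastK` is hypothetical.

```python
import bisect
from typing import List


class LISAtLeastK:
    """One-pass check of whether a stream has a strictly increasing subsequence of length k.

    Keeps at most k 'tails' (the smallest last value of an increasing
    subsequence of each length), i.e. O(k) values or O(k log m) bits for
    values in {1, ..., m}; each update is one binary search, O(log k).
    """

    def __init__(self, k: int) -> None:
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self.tails: List[int] = []

    def update(self, x: int) -> None:
        if len(self.tails) == self.k:
            return  # length k already reached; the answer can no longer change
        i = bisect.bisect_left(self.tails, x)
        if i == len(self.tails):
            self.tails.append(x)  # extends the longest subsequence so far
        else:
            self.tails[i] = x  # smaller tail for subsequences of length i + 1

    def reached(self) -> bool:
        return len(self.tails) >= self.k
```

For instance, feeding 5, 1, 6, 2, 7 into `LISAtLeastK(3)` makes `reached()` return True (via 1, 2, 7), whereas feeding 3, 2, 1, 2 leaves it False.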
Journal:
- Transactions on Data Privacy
Volume 9
Publication date: 2016